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The Singular Value Decomposition is a matrix decomposition technique widely used in the 
analysis of multivariate data, such as complex space-time images obtained in both physical and 
biological systems. In this paper, we examine the distribution of Singular Values of low rank matrices 
corrupted by additive noise. Past studies have been limited to uniform uncorrelated noise. Using 
diagrammatic and saddle point integration techniques, we extend these results to heterogeneous and 
correlated noise sources. We also provide perturbative estimates of error bars on the reconstructed 
low rank matrix obtained by truncating a Singular Value Decomposition. 



In analysing large, multivariate data, certain quantities naturally arise that are in some sense 'self averaging', 
namely in the large size limit, a single data set can comprise a statistical ensemble for the quantity in question. 
One such quantity, the singular value distribution of a data matrix, is the subject of this paper. The Singular Value 
Decomposition (SVD) is a representation of a general matrix of fundamental importance in linear algebra that is widely 
used to generate canonical representations of multivariate data. It is equivalent to Principal Component Analysis 
in multivariate statistics, but in addition is used to generate low dimensional representations for complex multi- 
dimensional time series. One example is to generate effective low dimensional representations of high dimensional 
dynamical systems. Another example of curent interest is to de-noise and compress dynamic imaging data, in particular 
in the case of direct or indirect images of neuronal activity. 

For most of the above applications it is important to understand the properties of an SVD of a matrix whose entries 
show some degree of random fluctuations. This has been achieved to an extent in multivariate statistics, where the 
sampling distributions of quantities associated with the PCA are computed; however, the computations involved are 
difficult and exact distributional results are limited. 

In this paper, we use diagrammatic and saddle point integration techniques to obtain the densities of singular values 
of matrices whose entries have varying degrees of randomness. In particular, we study the problem in the asymptotic 
limit of large matrix size; this limit is well justified in realistic cases as will be described below. The density of SV's 
has been obtained before, with other techniques, for matrices with each entry independently distributed normally 
with indentical variances Jl|,||. We are able to obtain distributions for some more general cases where the variances 
are not equal and/or correlations are present between matrix entries. Our results have implications towards isolating 
random components from image time series. Also, these results help in understanding the effects of truncating the 
SV spectrum at a given point, a technique that is widely applied to remove noise from data. 

The SVD of an arbitrary (in general complex) p x q matrix (p > q) M. is given by M. = UAV', where the p x q 
matrix U has orthonormal rows, the qxq matrix A is diagonal with real, non-negative entries and the qxq matrix V is 
unitary. Note that the matrices MAi^ = UA 2 U^ and M. = WA 2 F are hermitian, with eigenvalues corresponding 
to the diagonal entries of A 2 and U and V the corresponding matrices of eigenvectors. Consider the special case of 
space-time data /(x, t). The SVD of such data is given by 

7(x,<) =^A„/„(x)a„(i) (1) 

n 

where I„(x) are the eigenmodes of the spatial "correlation" matrix C(x, x') = J^, J(x, t) I(x', t), and similarly a n (t) 
are the eigenmodes of the "temporal correlation function" C(t, t') — JT,* t)I(x-, f). If one considered the sequence 
of images as randomly chosen from an ensemble of spatial images, then C(x, x') would converge to the ensemble 
spatial correlation function in the limit of long times. If in addition the ensemble had space translational invariance 
then the eigenmodes / ra (x) would be plane waves e lk x , the mode number "n" would correspond to wavevectors and 
the singular values would correspond to the spatial structure factor 5(k). In general, the image ensemble in question 
will not have translational invariance; however the SVD will then provide a basis set analogous to wave vectors. In 
physics one normally encounters the structure factors S'(k) that decay with wave vector, and in the more general case, 
the singular value spectrum, organized in descending order, will show a decay indicating the structure in the data. 

We consider the case of a p x q matrix M = Mo + N where M o is fixed and the entries of ./V are normally distributed 
with zero mean. We consider below several cases of normal distributions for entries of N, including cases where there 
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are correlations between entries of N. Aio may be thought of as the desired or underlying signal; for an SVD to be 
useful, M.q should effectively have a low rank structure. 
It is convenient to work in terms of the resolvent 

G(z) = Tr[(z - M^M)- 1 ] = J2 tKi (2) 



where Q(z) is a complex function. The density of SV's is given by 



2A r i 

p(X) = y§(\- A„) = — lim Im g(\ 2 - ie) - Q{\ 2 - ie) (3) 

n 

We proceed by finding a closed equation for the average of Q{z) in the limit where p, q — > oo and Oi — > such that 
the variance of a matrix entry times q, a 2 q converges to a finite limit a 2 and p/q is held fixed. The density of states 
is expected to be a self averaging quantity. Thus although we compute results by averaging over the ensemble, we 
expect to be able to apply our results to the SVD of individual data matrices. 

To illustrate the method, consider the simplest case, where each element of the matrix is i.i.d. normally as Af(0, a). 
Since 

Q(z) = d z lndet(z - M^M) (4) 

the average of the resolvent over the probability distribution of the matrix M can be obtained from (lndct(z — 
M^M.)), which in turn may be computed using replicas. We introduce n replicas of q dimensional real vectors 
X a = (x a i, ..,x aq )(a = l,..,n). Consider the following identity 



z n = [ [ns =1 n2 =1 ] (e^(-f £ x ^ z - ^M)X a ) ) = A*( [det(* - M*M)] 



r > (5) 



One obtains the desired quantity from the above by taking n — > 0. Before computing the expectation over M.'vn.Z n , 
we decouple the term X^Ai^AiXa using the Hubbard-Stratanovich transformation by introducing another n vectors 
Y a = (y a i, ■■,Uap){ot — 1, •-,«.) which are p dimensional. Performing the average over A4, we obtain 

J a a/3 

Now, we decouple J2 a/3 XjX a YaYp by using two n x n matrices [Q a p\ an d [Rap] as follows: 

Z n = 2-™ 2 (^-)^+" 2 J [JJ dXdYdRdQ] 

exp(-^[^(lj(4^ - d 2 Q a p)X fj + Y^(S aP - iR af3 )Y + iQp a R al3 )] ) 

a/3 

The integral over R enforces the condition Q a p = Y a Yj ', giving us back the previous expression. 

At this stage, we are in a position to write Z n as an integral over Q and R only, since we can do the integrals over 
X and Y. 

Z n = 2-™ 2 (^-)-^+™ 2 f []JdRdQ] exp(-^[\ndct(z - t 2 Q) + - lndct(l - iR) + itr(QR)]) (7) 

Ideally one should take the n — > limit first and then let q — > oo. In order to be able to perform analytical 
computations, we have to take the limit in the reverse order . That this gives the correct answer is verified later by a 
direct diagrammatic method. 

When p,q — > oo, with p/q fixed, the integral is dominated by some saddle point. We take the replica diagonal 
ansatz, consistent with all the symmetries. 

Qa(3 = Q(z)6 al 3 (8) 
R af3 = iR(z)8 al 3 (9) 
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Then we have to minimize 



S(Q(z),R(z)) = ln(z - d 2 Q{z)) + ? ln(l - R{z)) + Q(z)R(z) 



with respect to Q(z) and R(z). Hence the equations, 

1 



1 



z-a 2 Q(z) a 2 



(10) 



(11) 



p/q 



i - R(z. 



= Q(z) 



Using the fact that [-nqS(Q(z),R{z))] and |, we get 



so that G(z) satisfies 



G{z) = (G(z)) 



G(z) 



z — a 2 Q(z) 



q-a 2 G{z) 



(12) 



(13) 



(14) 



This equation can also be obtained from a direct diagrammatic resummation of {Tr Z _^\ M j expanded in powers 
of - . The diagrammatic representation of these terms are shown in Fig. 1 . 



a ^ a a 

z 




FIG. 1. Diagrammatic representation of successive terms in the resolvent. 



We use Wick's theorem to take the average over M with 

(M la M jb ) = a 2 5 t] 5 ab (15) 

We have to concentrate only on the one-particle irreducible graphs, which give rise to the self-energy. The advantage 
of taking the large p,q limit is that we have to consider only planar diagrams J3|. The diagrams contributing to 
self-energy, in the large p,q limit are shown schematically in the Fig. 2. 
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^ ► 

r times, with full propagator. 

FIG. 2. The diagrams that contribute to the self energy in large p, q limit. 



Summing the geometric series in this limit, we obtain 



s 00 = i ~ 2nl w (16) 

1 — G l (j{z)lq 



G(z) = = ^— (17) 



The solution of this equation is 



and 



ttXIt 

for X m i n < X < X max and zero elsewhere. 



G(z) = 4- [-* 2 (p/q -l) + z± y/±peP - (z - a*(p/q + 1))2] (18) 



P(A) = ^^(aL~^)(A^ALJ (19) 



Amoi,m.n = &\J (p/q + 1) ± 2 VpAz = V^r^ (p + ?)/2 ± V?? (20) 
These results have been obtained by various authors (see for example Q). 

Generalising our methods, both the saddle point technique and the perturbative method, to the following cases is 
quite easy. 
Case 1) 

(M ia ) = Ml (21) 

The matrix M° has singular values Ao a wherea = 1, • • • , q. 
The covariance matrix is given as before by 

((Mia - Ml)(M ob - M%)) = <T 2 6 l3 5 ab (22) 

In this case we obtain 

1 i ~2 

G(z) = Y - — =gJz ^— A (23) 
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where 



In case there are only few non-trivial Ao a 's, G(z) still satisfies a polynomial equation of order two or higher. Denby 
and Mallows S obtained similar results using a different method. 

One of the simple consequences of eqn. E3^ is the following. Consider a situation where there are only r non-zero 
singular values of M°, each of which is much bigger than the noise. Let the nonzero SV's be {Aoi, . . . , Ao r }- In the 
limit of zero noise, 

g w = — +EtV (25) 

In presense of finite small er, the pole at zero broadens into a cut close to the origin, the other cuts develop around 
the nontrivial singular values which are far away from the origin. 

Let us try to get the expression of G(z) around the origin. For z ~ a, ignoring terms of the order (&/\o a ) 2 for 
a < r, we find 

i ^ q — r 

G(z) ^ ; pa 2 pa 2 (^^) 



a=r+l * q-a 2 G(z) * l~a 2 G(z) 

which is just eqn. ( |l7|) with q replaced by q — r, a (but not a) remaining the same. Hence the smallest q — r singular 
values have the same distribution as if they have come from a pure noise matrix which is of a smaller size, namely 
p x (q — r). This result is useful in fitting the formula to the tail of the singular value spectrum in a real data matrix, 
and is used in the fit displayed in Fig. 3. 
Case 2) 

(Mia) = (27) 



{M ia M 3b ) = C t3 D ab (28) 

C and D being p x p and q x q matrices. Such correlations may arise in imaging data when there is spatial inhomogeneity 
in the variance or spatial correlations due to filtering of an underlying uncorrclatcd spatial noise distribution, and/or 
when there is temporal filtering of data. 
Here we consider 

^Hjrlwf) < 29 > 

Note that we did not take the trace, so that G is a matrix. 
We find that G satisfies the equation 

G(z) = 1 c (30) 

1-GTr{DG{z)) 

To the best of our knowledge, this is a new result. To see how to use it, let us consider two special cases, 
a) When the noise variance differs from point to point in space. 

(M ia M jb ) = a 2 a 8 l3 8 ab (31) 

In this case, 

Gaa(z) = 1 ^ ~ (32) 

which is a set of closed equations forG Qa (z),a = 1, • • • , q. 
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b)There are non-trivial temporal correlations, introduced for example by linear filtering of an underlying uncorre- 
cted process: 



Here 



(M ia M jb ) = C l3 5 ab (33) 



G{z) = TrG(z) = * c (34) 

z — 1 r-. 



l-CG(z) 



If the eigenvalues of C are af,i = !,■■• ,p, then 



G{z) = / g (35) 



How do we obtain the singular value spectrum from Eq.p5|? One way is to rewrite it as 

l— 1 cr^ 

We want to know G for real z. It is useful to first think of z as a function of G as defined in eqn.^ in the complex 
G plane. We now look for level sets of Im(z(G)). By tracing the appropriate branch of the curve Im(z(G)) — 0, one 
can solve for G(z) for real z. Taking the imaginary part of the function G(z) thus found gives the density of singular 
values. The cumulative density of states, or equivalently the sorted singular values, can be found by integrating the 
singular value density. 

Qualitative insight may be gained by realising that the real and the imaginary parts of z are the two components of 
the electric field in a 2-dimensional electrostatic problem, with a charge q at the origin, and point charges of strength 
— 1 placed at each of the points (1/crj, 0), i = 1, • • • ,p in the complex z plane. 

In addition to the density of singular values, one can try to compute the correlations between nearby singular 
values. It is well known in the theory of random matrices that the correlation functions of the eigenvalues of a random 
hermitian matrix has interesting universal features Q . This is true for eigenvalues chosen from a small enough region, 
so that the eigenvalue density in that region is more or less constant. We find that the correlations of the singular- 
values of a matrix, having each matrix element iid distributed with mean zero and variance tr 2 , are in the same 
universality class as the Gaussian Unitary Ensemble. The probability density p(AX) of level spacings AA goes as 
AA 2 for AA << AA. The probability density of s = AA/AA for the Gaussian Unitary Ensemble is well known in the 
Random Matrix literature [Q . It is possible that empirical level-spacing statistics can be used as a diagnostic to find 
out which singular values correspond mostly to 'noise' and which correspond mostly to 'signal'. 

So far, we have discussed what happens to the singular values. We would also like to estimate the errors made in 
reconstructing the matrix by keeping a small number terms in the left hand side of [l] which correspond to the biggest 
singular values. If we keep too few terms, we lose part of the signal. If we keep too many, we introduce back the 
noise. It would be useful to have expressions of the bias and the variance of the reconstruction. Unfortunately we 
do not have a simple extension of previous techniques to these calculations. Instead we compute these quanities for 
small sigma by doing a perturbative expansion. 

Let us go back to case 1), namely when each element of the matrix M is independently distributed with same 

Let 



MP =<M ia >=J2*o b u?*v° b (37) 



and 



M ia = hufv b a (38) 



b 

,b*„,b 



We would like to calculate the mean and the variance of the variable Mi a = J2bcs ^bUi*v a where S is a subset of 

{i -/!• 

For small tr, 



G 



< M ia > = E 

bes L 



MbU t v a + 2\ ob a 

c^b 



Aoc 



-2a 2 \ob 



\2 

A 0c 



(\2 \2 \2 s « 
c#b \\b ~ A 0cJ 



+ o(a 4 ) 



(39) 



var(Mi a ) 



^ 2 E 

&6S 



2 + ^E 



(Aq& + Aq 7 ) 



06 



Oi* 06 1 2 



+4,E 

c^6 



(Apfc + ^lc) 
2 ^,2 



(Ag b 



Aq c ) 



= E 

cGS,c/6 



(AofeA 0e ) | Qe* 06|2 



(Ag b 



\2 ^,2 I 



o(a 4 ) 



(40) 



In this expression j runs from 1 to p with Xqj = 0, for/ > q. 

To illustrate the utility of these results, we consider the SV distribution obtained from a space-time data set 
consisting of functional Magnetic Resonance Images (fMRI). The experimental details regarding the chosen data set 
can be found in For our purposes, the data constitutes a 1724 x 550 matrix. The longer dimension corresponds to 
a subset of the pixels in a 64 x 64 spatial image obtaining by discarding pixels which have intensity below a selected 
threshhold. 
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FIG. 3. 



1 

100 200 300 400 500 
Ordinal Number 

Comparison of singular values from a SVD of an fMRI data set with the theoretical formula for a noise only matrix. 



In Fig. 3, the tail of the SV distribution from this data is displayed along with a fit to a theoretical curve obtained 
from cqn.( |19|) . The distribution has two adjustable parameters. One of them is the variance a. A second adjustable 
parameter in the fit is the rank of the original matrix, which in this case has been assumed to be 60. We mentioned 
before that the effect of a few big singular values coming from the signal on the smaller singular values coming from 
the noise is an effective reduction of the dimensionality of the noise matrix. We fit the tail to the singular values of a 
1724 x (550—60) pure-noise matrix. In fact, in the present case the uncorrelated noise can be estimated independently, 
and is therefore not really a free fitting parameter. We found that the fitted value of a is in close correspondence with 
the independently estimated value of the noise variance (data not shown). 

In the example above, the good fit obtained between the theoretical curve and the tail in the SV distribution 
indicates that the noise entries in the original data were uniform and uncorrelated. It is easy to find experimental 
data where these conditions are violated, for example optical measurements of electrical activity in brain tissue |^| 
where the illumination is not fully uniform and the shot noise varies from point to point in space. Alternatively, 
the data may be spatially filtered and correlations may be introduced in space but not in time. Both of these cases 
produce SV distributions that cannot be fit by the procedure described above, but may be understood using eqn. 
( |32l ) . Details of these applications will be published elsewhere. 
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